Abstract

We consider maximum likelihood estimation of finite mixtures of uniform distributions. We prove that the maximum likelihood estimator is strongly consistent if the scale parameters of the component uniform distributions are restricted from below by $\exp(-n^d)$, $0<d<1$, where $n$ is the sample size.

Key words and phrases: Mixture distribution, maximum likelihood estimator, consistency.

1 Introduction 

Consider a mixture of two uniform distributions

$$(1-\alpha)f_1(x;a_1,b_1)+\alpha f_2(x;a_2,b_2),$$

where $f_m(x;a_m,b_m)$, $m=1,2$, are uniform densities with parameter $(a_m,b_m)$ on the half-open intervals $[a_m-b_m,a_m+b_m)$ and $0<\alpha<1$. For definiteness and convenience we use half-open intervals in this paper, although the intervals could equally be open or closed. With half-open intervals our densities are right continuous, so the version of the density is uniquely determined. For simplicity suppose that $a_1=1/2$, $b_1=1/2$, $\alpha=\alpha_0$ are known and the parameter space is

$$\{(a_2,b_2)\mid 0\le a_2-b_2,\ a_2+b_2\le 1\}$$

so that the support of the density is $[0,1)$. Let $x_1,\dots,x_n$ denote a random sample of size $n\ge 2$ from the true density $(1-\alpha_0)f_1(x;1/2,1/2)+\alpha_0 f_2(x;a_{2,0},b_{2,0})$. If we set $a_2=x_1$, then the likelihood tends to infinity as $b_2\to 0$ (Figure 1). Hence the maximum likelihood estimator is not consistent; in fact, it does not exist for any finite $n$.
Figure 1: The likelihood tends to infinity as $b_2\to 0$ at $a_2=x_1$.



When we impose the restriction $b_2\ge c$, where $c$ is a positive real constant, we avoid the divergence of the likelihood, and the maximum likelihood estimator is strongly consistent provided that $b_{2,0}\ge c$. But there is the problem of how small $c$ has to be chosen to ensure $b_{2,0}\ge c$, since we do not know $b_{2,0}$. An interesting question here is whether we can decrease the bound $c=c_n$ to zero with the sample size $n$ and yet guarantee the strong consistency of the maximum likelihood estimator. If this is possible, a further question is how fast $c_n$ can decrease to zero. This question is similar to the (so far open) problem stated in Hathaway (1985), which treats mixtures of normal distributions with constraints imposed on the ratios of variances. See also the discussion in Section 3.8 of McLachlan and Peel (2000).

Figure 2 depicts an example of a likelihood function. A random sample of size $n=40$ is generated from $0.6\,f(x;0.5,0.5)+0.4\,f(x;0.6,0.2)$, and the model is $0.6\,f(x;0.5,0.5)+0.4\,f(x;a,b)$. Despite the limited resolution of Figure 2, there are actually $n=40$ peaks of the likelihood function as $b\downarrow 0$. Although the likelihood function diverges to infinity at these peaks, the divergence takes place only for very small $b$, and the likelihood function is well behaved over most of the range of $b$. This suggests that the bound $c_n$ can decrease to zero fairly quickly while maintaining the consistency of the maximum likelihood estimator. In fact, we prove that $c_n$ can decrease exponentially fast to zero for the mixture of $M$ uniform distributions. More precisely, we prove that the maximum likelihood estimator is strongly consistent if $c_n=\exp(-n^d)$, $0<d<1$.
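The divergence described above is easy to reproduce numerically. The following Python sketch is an illustration only: the function names and the random seed are our own choices, not the code used for Figure 2. It draws a sample of size 40 from the same true model and evaluates the log likelihood of $0.6\,f(x;0.5,0.5)+0.4\,f(x;a,b)$ with the second component centred at the first observation, $a=x_1$.

```python
import math
import random

def uniform_pdf(x, a, b):
    """Uniform density with centre a and half-width b on the half-open interval [a - b, a + b)."""
    return 1.0 / (2.0 * b) if a - b <= x < a + b else 0.0

def mixture_loglik(xs, a, b, alpha=0.4):
    """Log likelihood of the model (1 - alpha) f(x; 0.5, 0.5) + alpha f(x; a, b)."""
    total = 0.0
    for x in xs:
        # the first component is U[0, 1), so the density is positive for every x in [0, 1)
        total += math.log((1 - alpha) * uniform_pdf(x, 0.5, 0.5) + alpha * uniform_pdf(x, a, b))
    return total

def draw_sample(n, seed=0):
    """Draw n points from 0.6 f(x; 0.5, 0.5) + 0.4 f(x; 0.6, 0.2)."""
    rng = random.Random(seed)
    return [rng.uniform(0.0, 1.0) if rng.random() < 0.6 else rng.uniform(0.4, 0.8)
            for _ in range(n)]

xs = draw_sample(40)
for b in (1e-2, 1e-4, 1e-6, 1e-9):
    # once only x_1 lies in [x_1 - b, x_1 + b), the log likelihood grows like log(1/b)
    print(f"b = {b:g}: log likelihood = {mixture_loglik(xs, xs[0], b):.3f}")
```

Any of the $n$ observations can serve as the centre, which is why the likelihood in Figure 2 has $n$ peaks.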

The organization of the paper is as follows. In Section 2 we summarize some preliminary results. In Section 3 we state our main result in Theorem 3.1. The proof of Theorem 3.1 is given in Appendix A. In Section 4 we give a simulation result and some discussions.



2 Preliminaries on identifiability of mixture distributions and strong consistency

In this section, we consider the identifiability and strong consistency of finite mixtures. The properties of finite mixtures treated in this section concern general finite mixture distributions.

Figure 2: An example of the log likelihood function for $n=40$ (vertical axis: log likelihood).

A mixture of $M$ densities with parameter $\theta=(\alpha_1,\eta_1,\dots,\alpha_M,\eta_M)$ is defined by

$$f(x;\theta)=\sum_{m=1}^{M}\alpha_m f_m(x;\eta_m),$$

where $\alpha_m$, $m=1,\dots,M$, called the mixing weights, are nonnegative real numbers that sum to one, and $f_m(x;\eta_m)$ are densities with parameter $\eta_m$. The densities $f_m(x;\eta_m)$ are called the components of the mixture. Let $\Theta$ denote the parameter space.

In general, identifiability of a parametric family of densities is defined as follows. Note 
that in this paper a version of the density is uniquely determined by the right continuity. 

Definition 2.1. (identifiability of a parametric family of densities)

A parametric family of densities $\{f(x;\theta)\mid\theta\in\Theta\}$ is identifiable if different values of the parameter designate different densities; that is,

$$f(x;\theta)=f(x;\theta')\quad\forall x$$

implies $\theta=\theta'$.

If a parametric family of densities is not identifiable, then it is said to be unidentifiable. 

In the mixture case, when all components $f_m(x;\eta_m)$, $m=1,\dots,M$, belong to the same parametric family, $f(x;\theta)$ is invariant under permutations of the component labels. Because of this trivial unidentifiability, the definition of identifiability for mixture densities can be weakened, as described in Teicher (1960), Yakowitz and Spragins (1968), McLachlan and Peel (2000) and others, so that

$$\sum_{m=1}^{M}\alpha_m f_m(x;\eta_m)=\sum_{m'=1}^{M'}\alpha'_{m'}f_{m'}(x;\eta'_{m'})$$

implies $M=M'$ and for each $m$ there exists some $m'$ such that $\alpha_m=\alpha'_{m'}$ and $\eta_m=\eta'_{m'}$. But even under such a weakened definition, mixtures of densities remain unidentifiable. For example, if $\alpha_1=0$, then all parameters which differ only in $\eta_1$ give the same density. We also discuss examples of non-trivial unidentifiability of mixtures after Theorem 3.1 below. In any case, the mixture model is unidentifiable.

In the unidentifiable case, the true model may correspond to two or more points in the parameter space. Therefore we have to define strong consistency of an estimator $\hat\theta_n$ carefully: $\hat\theta_n$ should be called consistent if it falls in an arbitrarily small neighborhood of the set of points designating the true model as $n\to\infty$.

The following definition is essentially the same as Redner's (1981). We suppose that the parameter space is a subset of Euclidean space, and $\mathrm{dist}(\theta,\theta')$ denotes the Euclidean distance between $\theta,\theta'\in\Theta$.

Definition 2.2. (strongly consistent estimator)
Let $\Theta_0$ denote the set of true parameters

$$\Theta_0=\{\theta\in\Theta\mid f(x;\theta)=f(x;\theta_0)\ \forall x\},$$

where $\theta_0$ is one of the parameters designating the true distribution. An estimator $\hat\theta_n$ is strongly consistent if

$$\mathrm{Prob}\left(\lim_{n\to\infty}\inf_{\theta\in\Theta_0}\mathrm{dist}(\hat\theta_n,\theta)=0\right)=1.$$

In this paper the two notations $\mathrm{Prob}(A)=1$ and "$A$, a.e." ($A$ holds almost everywhere) will be used interchangeably. The index 0 on the parameter always denotes the true parameter.

In the finite mixture case, regularity conditions for strong consistency of the maximum likelihood estimator are given in Redner (1981). When the components of the mixture are densities of continuous distributions and the parameter space is Euclidean, the conditions become as follows. Let $\Gamma$ denote a subset of the parameter space.

Condition 1. $\Gamma$ is a compact subset of Euclidean space.

For $\theta\in\Gamma$ and any positive real number $r$, let

$$f(x;\theta,r)=\sup_{\mathrm{dist}(\theta',\theta)\le r}f(x;\theta'),\qquad f^*(x;\theta,r)=\max(1,f(x;\theta,r)).$$



Condition 2. For each $\theta\in\Gamma$ and sufficiently small $r$, $f(x;\theta,r)$ is measurable and

$$\int\log\bigl(f^*(x;\theta,r)\bigr)f(x;\theta_0)\,dx<\infty.\tag{2.1}$$
Condition 3. If $\lim_{n\to\infty}\theta_n=\theta$, then $\lim_{n\to\infty}f(x;\theta_n)=f(x;\theta)$ except on a set which is a null set and does not depend on the sequence $\{\theta_n\}_{n=1}^{\infty}$.

Condition 4.

$$\int\bigl|\log f(x;\theta_0)\bigr|\,f(x;\theta_0)\,dx<\infty.\tag{2.2}$$



The following two theorems were proved by Wald (1949) and Redner (1981).

Theorem 2.1. (Wald (1949), Redner (1981)) Suppose that Conditions 1, 2, 3 and 4 are satisfied. Let $S$ be any closed subset of $\Gamma$ not intersecting $\Theta_0$. Then

$$\mathrm{Prob}\left(\lim_{n\to\infty}\frac{\sup_{\theta\in S}f(x_1;\theta)\times\cdots\times f(x_n;\theta)}{f(x_1;\theta_0)\times\cdots\times f(x_n;\theta_0)}=0\right)=1.\tag{2.3}$$



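Theorem 2.2. (Wald (1949), Redner (1981)) Let $\hat\theta_n$ be any function of the observations $x_1,\dots,x_n$ such that

$$\frac{f(x_1;\hat\theta_n)\times\cdots\times f(x_n;\hat\theta_n)}{f(x_1;\theta_0)\times\cdots\times f(x_n;\theta_0)}\ge\delta>0\quad\text{for all }n;$$

then $\mathrm{Prob}\bigl(\lim_{n\to\infty}\inf_{\theta\in\Theta_0}\mathrm{dist}(\hat\theta_n,\theta)=0\bigr)=1$.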

If Conditions 1, 2, 3 and 4 are satisfied, then it is readily verified by Theorems 2.1 and 2.2 that the maximum likelihood estimator restricted to $\Gamma$ is strongly consistent.

We also state Okamoto's inequality, which will be used in our proof in Appendix A.

Theorem 2.3. (Okamoto (1958)) Let $Z$ be a random variable following a binomial distribution $\mathrm{Bin}(n,p)$. Then for $\delta>0$,

$$\mathrm{Prob}\left(\frac{Z}{n}-p\ge\delta\right)\le\exp(-2n\delta^2).\tag{2.4}$$
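As a quick sanity check of (2.4), the sketch below compares the exact binomial upper tail with the bound $\exp(-2n\delta^2)$ for a few values of $n$, $p$ and $\delta$. It is an illustration only; the helper names are hypothetical.

```python
import math

def binomial_upper_tail(n, p, delta):
    """Exact Prob(Z/n - p >= delta) for Z ~ Bin(n, p)."""
    # smallest k with k/n - p >= delta; the small slack absorbs round-off in n * (p + delta)
    k_min = max(0, math.ceil(n * (p + delta) - 1e-9))
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))

def okamoto_bound(n, delta):
    """Right-hand side of (2.4)."""
    return math.exp(-2 * n * delta**2)

for n in (10, 100, 1000):
    print(n, binomial_upper_tail(n, 0.3, 0.1), okamoto_bound(n, 0.1))
```

The bound does not depend on $p$, which is what allows it to be applied uniformly to all short intervals in the proofs of Lemmas A.3 and A.5.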

3 Main result 

Here we generalize the problem stated in the introduction to the problem of a mixture of $M$ uniform distributions and then state our main theorem.

A mixture of $M$ uniform densities with parameter $\theta$ is defined by

$$f(x;\theta)=\sum_{m=1}^{M}\alpha_m f_m(x;\eta_m),$$

where $f_m(x;\eta_m)=f_m(x;a_m,b_m)$, $m=1,\dots,M$, are uniform densities with parameter $\eta_m=(a_m,b_m)$ on the half-open intervals $[a_m-b_m,a_m+b_m)$ and $\alpha_m$ are mixing weights. The parameter space $\Theta\subset\mathbb{R}^{3M}$ is defined by

$$\Theta=\Bigl\{(\alpha_1,a_1,b_1,\dots,\alpha_M,a_M,b_M)\ \Bigm|\ 0\le\alpha_1,\dots,\alpha_M\le 1,\ \sum_{m=1}^{M}\alpha_m=1,\ b_1,\dots,b_M>0\Bigr\}.$$

Let $\theta_0=(\alpha_{0,1},a_{0,1},b_{0,1},\dots,\alpha_{0,M},a_{0,M},b_{0,M})$ be the true parameter and let

$$f(x;\theta_0)=\sum_{m=1}^{M}\alpha_{0,m}f_m(x;a_{0,m},b_{0,m})$$

be the true density.

be the true density. Denote the minimum and the maximum of the support of f(x; 6q) by 

£min = rmn(a 0i i — &o,i, . . . , ao,M — &o,m), 
imax = max(a ,i + &o,i> . . . , a 0) M + &o,a/), 

and let 

Let $\Theta_c$ be a constrained parameter space

$$\Theta_c=\{\theta\in\Theta\mid b_m\ge c>0,\ m=1,\dots,M\},$$

where $c$ is a positive real constant. We can easily see that Conditions 1, 2, 3 and 4 are satisfied with $\Theta_c$. Therefore if $\theta_0\in\Theta_c$, then the maximum likelihood estimator restricted to $\Theta_c$ is strongly consistent (Redner (1981)). But there is a problem of how small $c$ must be to ensure $\theta_0\in\Theta_c$, as discussed in Section 1.

Since the support of a uniform density is compact, the following lemma holds.

Lemma 3.1. For any parameter $\theta=(\alpha_1,a_1,b_1,\dots,\alpha_M,a_M,b_M)\in\Theta$, there exists a parameter $\theta'=(\alpha_1,a'_1,b'_1,\dots,\alpha_M,a'_M,b'_M)\in\Theta$ satisfying

$$L_{\min}\le a'_1,\dots,a'_M\le L_{\max},\qquad 0<b'_1,\dots,b'_M\le L$$

such that

$$\sum_{m=1}^{M}\alpha_m f_m(x;a_m,b_m)\le\sum_{m=1}^{M}\alpha_m f_m(x;a'_m,b'_m),\qquad x\in[L_{\min},L_{\max}),$$

where equality does not hold if there exists $\alpha_m>0$ such that $a_m\notin[L_{\min},L_{\max}]$ or $b_m>L$.

By Lemma 3.1, the maximum likelihood estimator is restricted to a bounded set in $\Theta\subset\mathbb{R}^{3M}$.

Let $\{c_n\}_{n=0}^{\infty}$ be a monotone decreasing sequence of positive real numbers converging to zero and define $\Theta_n$ by

$$\Theta_n=\{\theta\in\Theta\mid 0<c_n\le b_m,\ m=1,\dots,M\}.$$

We are now ready to state our main theorem. 
Theorem 3.1. Suppose that the true model $f(x;\theta_0)$ cannot be represented by any model consisting of fewer than $M$ components. Let $c>0$ and $0<d<1$. If the scale parameters are restricted by $c_n=c\exp(-n^d)\le b_m$ for all $b_m$, then the maximum likelihood estimator (which is restricted to $\Theta_n$) is strongly consistent.

The proof of this theorem is given in Appendix A.

Note that under the assumption of Theorem 3.1 the strong consistency holds even if the true model is unidentifiable in a non-trivial way. We illustrate the assumption of Theorem 3.1 by examples of two-component models, where $U(x;a,b)$ denotes the uniform density on $[a,b)$. If the true model is $\alpha U(x;0,a)+(1-\alpha)U(x;a,1)$ with $\alpha=a$ (see Titterington et al. (1985), p. 36), which is unidentifiable and can be represented by a one-component model, then the assumption of Theorem 3.1 is not satisfied. But if the true model is $\frac13U(x;-1,1)+\frac23U(x;-2,2)$ (see Everitt and Hand (1981), p. 5), which is unidentifiable because $\frac12U(x;-2,1)+\frac12U(x;-1,2)$ represents the same distribution, then the assumption of Theorem 3.1 is satisfied, because it cannot be represented by a one-component model.
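Both identities can be checked in exact rational arithmetic. In the sketch below (hypothetical helper names; $U(x;\mathrm{lo},\mathrm{hi})$ is the uniform density on $[\mathrm{lo},\mathrm{hi})$ as in the text), a mixture is a list of (weight, lo, hi) triples, and two mixtures are compared at every interval end point, which suffices because both densities are right-continuous step functions.

```python
from fractions import Fraction as F

def mixture_density(components, x):
    """Density at x of sum_j w_j U(x; lo_j, hi_j), with U(x; lo, hi) uniform on [lo, hi)."""
    return sum(w / (hi - lo) for w, lo, hi in components if lo <= x < hi)

def same_distribution(mix1, mix2):
    """Both densities are right-continuous step functions that can jump only at interval
    end points and vanish to the left of the smallest one, so they coincide everywhere
    iff they agree at every end point."""
    cuts = sorted({e for _, lo, hi in mix1 + mix2 for e in (lo, hi)})
    return all(mixture_density(mix1, x) == mixture_density(mix2, x) for x in cuts)

a = F(3, 10)
uniform01 = [(F(1), F(0), F(1))]
split = [(a, F(0), a), (1 - a, a, F(1))]                 # alpha = a
print(same_distribution(uniform01, split))               # True

mix_a = [(F(1, 3), F(-1), F(1)), (F(2, 3), F(-2), F(2))]
mix_b = [(F(1, 2), F(-2), F(1)), (F(1, 2), F(-1), F(2))]
print(same_distribution(mix_a, mix_b))                   # True
```

With equal weights $\frac12,\frac12$ in the first Everitt and Hand mixture the check fails (the densities differ on $[-2,-1)$), and the Titterington et al. mixture equals $U(x;0,1)$ only when $\alpha=a$.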

The next proposition states that the rate $c_n=\exp(-n^d)$, $d<1$, obtained in Theorem 3.1 is almost the lower bound on the order of $c_n$ that maintains consistency.

Proposition 3.1. If $c_n$ decreases faster than $\exp(-n)$, i.e., $e^n c_n\to 0$, then the consistency of the maximum likelihood estimator restricted to $\Theta_n$ fails.

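Proof: By the strong law of large numbers, the mean log likelihood of the true model $\frac1n\log\prod_{i=1}^{n}f(x_i;\theta_0)$ converges to $E[\log f(x;\theta_0)]<\infty$ almost everywhere. Assume that $c_n$ decreases faster than $\exp(-n)$. Take $a_1=x_1$, $b_1=c_n$. Fix $\alpha_1>0$ and fix the other parameters $(\alpha_2,\eta_2,\dots,\alpha_M,\eta_M)$ such that $\frac1n\sum_{i=2}^{n}\log\{\sum_{m=2}^{M}\alpha_m f_m(x_i;\eta_m)\}$ converges to a finite limit almost everywhere. Then

$$\begin{aligned}\frac1n\sum_{i=1}^{n}\log\Bigl\{\sum_{m=1}^{M}\alpha_m f_m(x_i;\eta_m)\Bigr\}&=\frac1n\log\Bigl\{\sum_{m=1}^{M}\alpha_m f_m(x_1;\eta_m)\Bigr\}_{a_1=x_1,\,b_1=c_n}+\frac1n\sum_{i=2}^{n}\log\Bigl\{\sum_{m=1}^{M}\alpha_m f_m(x_i;\eta_m)\Bigr\}\\&\ge\frac1n\log\{\alpha_1 f_1(x_1;a_1=x_1,b_1=c_n)\}+\frac1n\sum_{i=2}^{n}\log\Bigl\{\sum_{m=2}^{M}\alpha_m f_m(x_i;\eta_m)\Bigr\}.\end{aligned}$$

Since $e^n c_n\to 0$, the first term on the right, $\frac1n\log\frac{\alpha_1}{2c_n}$, is at least $1+\frac1n\log\frac{\alpha_1}{2}$ for all sufficiently large $n$. Therefore the mean log likelihood of the true model is dominated by that of other models, and consistency of the maximum likelihood estimator fails. □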
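The role of the rate in Proposition 3.1 can be seen from the single term $\frac1n\log\frac{\alpha_1}{2c_n}$ contributed by a component centred at $x_1$. The sketch below is an illustration with the hypothetical choice $\alpha_1=0.5$; it works with $\log(1/c_n)$ directly to avoid underflow. It evaluates this term for $c_n=\exp(-n^{1/2})$, which is of the form allowed in Theorem 3.1, and for $c_n=\exp(-2n)$, which satisfies $e^n c_n\to 0$.

```python
import math

def spike_gain_per_obs(n, log_inv_c_n, alpha1=0.5):
    """(1/n) log(alpha1 / (2 c_n)), given log(1/c_n): the contribution to the mean log
    likelihood of a component of weight alpha1 centred at x_1 with scale c_n."""
    return (math.log(alpha1 / 2) + log_inv_c_n) / n

for n in (10, 100, 1000, 10000):
    slow = spike_gain_per_obs(n, n ** 0.5)   # c_n = exp(-n^(1/2)): the gain vanishes
    fast = spike_gain_per_obs(n, 2 * n)      # c_n = exp(-2n): the gain stays near 2
    print(n, round(slow, 4), round(fast, 4))
```

For $c_n=\exp(-n^d)$ with $d<1$ the gain is of order $n^{d-1}\to 0$, whereas under $e^n c_n\to 0$ it stays bounded away from zero, which is what breaks consistency.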



4 Some discussions 

As stated in Section 1, the failure of consistency of the maximum likelihood estimator is caused by the divergence of the likelihood of models in which some scale parameters go to zero. Therefore, in our setting it is of interest to investigate the behavior of the likelihood of the models on the boundary $\{b_m=c_n\}$ of the restricted parameter space $\Theta_n$.
We report a simulation result for the case in which the true model is $0.6\,f(x;0.5,0.5)+0.4\,f(x;0.6,0.2)$ and a competing model is $0.6\,f(x;0.5,0.5)+0.4\,f(x;a,b=c_n)$, which lies on the boundary ($b=c_n$) of the restricted parameter space, where $c_n=\exp(-n^{0.93})$. The

Table 1: log likelihood of the true model and that of a competing model

sample size n    log likelihood (true)    log likelihood (b = c_n)
10               0.7767                   2.305
50               9.769                    11.38
100              15.61                    20.26
500              56.49                    67.11
1000             117.9                    104.7
5000             582.6                    199.3

second column of Table 1 shows the log likelihood at $\hat\theta_n=\theta_0$. The third column shows the log likelihood maximized with respect to $a\in[0,1]$ (with $b$ fixed at $c_n$). In the competing model, with probability tending to 1, the interval length $2c_n$ is shorter than the minimum distance between realized values. Therefore, with probability tending to 1, the support of $f(x;a,b=c_n)$ does not contain two or more realized values for any $a\in[0,1]$, and the maximum of the likelihood is usually achieved when the support of $f(x;a,b=c_n)$ contains just one realized value. The competing mixture density then equals $0.6+0.4/(2c_n)$ at that realization and $0.6$ at the other $n-1$ realized values. In this case the maximum of the log likelihood of the competing model is $\log\{0.6+0.4/(2c_n)\}+(n-1)\log 0.6$. The result in Table 1 is based on one replication for each sample size; repeating the simulations gives similar results. The result in Table 1 therefore indicates that the log likelihood of the true model becomes larger than that of the competing models with $b=c_n$ as the sample size $n$ increases. This simulation result is consistent with Theorem 3.1.
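The third column of Table 1 can be compared with the closed form $\log\{0.6+0.4/(2c_n)\}+(n-1)\log 0.6$ given above, and the second column with its expected value $n\,E[\log f(x;\theta_0)]$. The sketch below evaluates both; it does not rerun the simulation, and the function name is hypothetical. The closed form is computed as $\log(0.2/c_n)+\log(1+3c_n)$, so $1/c_n$ is never formed, since it overflows for $n$ in the thousands.

```python
import math

def competing_loglik(n, d=0.93):
    """log{0.6 + 0.4/(2 c_n)} + (n - 1) log 0.6 with c_n = exp(-n^d), evaluated in log space."""
    log_inv_c = n ** d                                   # log(1 / c_n)
    # log(0.6 + 0.2/c_n) = log(0.2/c_n) + log1p(3 c_n); exp(-n^d) may underflow to 0.0, harmlessly
    spike = math.log(0.2) + log_inv_c + math.log1p(3.0 * math.exp(-log_inv_c))
    return spike + (n - 1) * math.log(0.6)

# Expected log likelihood per observation under the true model 0.6 f(x; 0.5, 0.5) + 0.4 f(x; 0.6, 0.2):
# the density is 1.6 on [0.4, 0.8) (probability 0.64) and 0.6 elsewhere on [0, 1) (probability 0.36).
E_TRUE = 0.64 * math.log(1.6) + 0.36 * math.log(0.6)

for n in (10, 50, 100, 500, 1000, 5000):
    print(n, round(competing_loglik(n), 2), round(n * E_TRUE, 2))
```

The closed form reproduces the third column to the printed precision, and the crossover between $n=500$ and $n=1000$ matches the table: the competing term grows like $n^{0.93}-0.51\,n$, while the true model gains about $0.117$ per observation.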

We expect that our result can be extended to other finite mixtures, especially to densities that are Lipschitz continuous when the scale parameters are fixed. On the other hand, in Theorem 3.1 it might be difficult to weaken the assumption that there is no representation of the true model with fewer than $M$ components. The problem studied in this paper is similar to the question stated in Hathaway (1985), which treats normal mixtures with the constraint imposed on the ratios of variances. The methods used in this paper may be useful for solving that question.

A Appendix: Proof of the strong consistency

Here we present a proof of Theorem 3.1. Note that it is sufficient to prove Theorem 3.1 for $d$ arbitrarily close to 1. Therefore we assume $d>1/4$ hereafter.

The whole proof is long and we divide it into smaller steps. Intermediate results will 
be given in a series of lemmas. 
Define

$$\Theta'_n=\{\theta\in\Theta_n\mid L_{\min}\le a_m\le L_{\max}\ \text{and}\ c_n\le b_m\le L\ \text{for all }m,\ \ b_m<c_0\ \text{for some }m\},$$

$$\Gamma=\{\theta\in\Theta\mid L_{\min}\le a_m\le L_{\max},\ c_0\le b_m\le L,\ m=1,\dots,M\}.$$

Because $\{c_n\}$ decreases to zero, by replacing $c_0$ by some $c_n$ if necessary, we can assume without loss of generality that $\Theta_0\subset\Gamma$.

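In view of Theorems 2.1 and 2.2, for the strong consistency of the MLE on $\Theta_n$, by Lemma 3.1 it suffices to prove that

$$\lim_{n\to\infty}\frac{\sup_{\theta\in S'\cup\Theta'_n}\prod_{i=1}^{n}f(x_i;\theta)}{\prod_{i=1}^{n}f(x_i;\theta_0)}=0,\quad\text{a.e.}$$

for all closed $S'\subset\Gamma$ not intersecting $\Theta_0$. Note that for all $S'$ and $\{x_i\}_{i=1}^{n}$,

$$\sup_{\theta\in S'\cup\Theta'_n}\prod_{i=1}^{n}f(x_i;\theta)=\max\Bigl\{\sup_{\theta\in S'}\prod_{i=1}^{n}f(x_i;\theta),\ \sup_{\theta\in\Theta'_n}\prod_{i=1}^{n}f(x_i;\theta)\Bigr\}.$$

Furthermore, equation (2.3) with $S$ replaced by $S'$ holds by Theorem 2.1. This implies that it suffices to prove equation (2.3) with $S$ replaced by $\Theta'_n$.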

Note that in the argument above the supremum of the likelihood function over $S'\cup\Theta'_n$ is considered separately for $S'$ and $\Theta'_n$, which form a covering of $S'\cup\Theta'_n$. In our proof, we consider finer and finer finite coverings of $\Theta'_n$. As above, it suffices to prove that the ratio of the supremum of the likelihood over each member of the covering to the likelihood at $\theta_0$ converges to zero almost everywhere.

Let $\theta\in\Theta'_n$. Let $K=K(\theta)\ge 1$ be the number of components which satisfy $b_m<c_0$. Without loss of generality, we can set $b_1\le b_2\le\dots\le b_K<c_0\le b_{K+1}\le\dots\le b_M$. Let $\Theta'_{n,K}$ be

$$\Theta'_{n,K}=\{\theta\in\Theta'_n\mid b_1\le b_2\le\dots\le b_K<c_0\le b_{K+1}\le\dots\le b_M\}.$$

Our first covering of $\Theta'_n$ is given by

$$\Theta'_n=\bigcup_{K=1}^{M}\Theta'_{n,K}.$$

As above, it suffices to prove equation (2.3) with $S$ replaced by $\Theta'_{n,K}$. We fix $K$ from now on. Define $\bar\Theta_K$ by

$$\bar\Theta_K=\Bigl\{(\alpha_{K+1},a_{K+1},b_{K+1},\dots,\alpha_M,a_M,b_M)\in\mathbb{R}^{3(M-K)}\ \Bigm|\ \sum_{m=K+1}^{M}\alpha_m\le 1,\ \alpha_m\ge 0,\ L_{\min}\le a_m\le L_{\max},\ c_0\le b_m\le L,\ m=K+1,\dots,M\Bigr\}$$
and for $\theta\in\bar\Theta_K$, define

$$\bar f(x;\theta)=\sum_{m=K+1}^{M}\alpha_m f_m(x;\eta_m),\qquad \bar f(x;\theta,\rho)=\sup_{\mathrm{dist}(\theta,\theta')<\rho}\bar f(x;\theta').$$

Note that $\bar f(x;\theta)$ is the density of a subprobability measure.

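Lemma A.1. Let $B(\theta,\rho(\theta))$ denote the open ball with center $\theta$ and radius $\rho(\theta)$. Then $\bar\Theta_K$ can be covered by a finite number of balls $B(\theta^{(1)},\rho(\theta^{(1)})),\dots,B(\theta^{(S)},\rho(\theta^{(S)}))$ such that

$$E_0\bigl[\log\bar f(x;\theta^{(s)},\rho(\theta^{(s)}))\bigr]<E_0[\log f(x;\theta_0)],\quad s=1,\dots,S,\tag{A.1}$$

where $E_0[\cdot]$ denotes the expectation under $\theta_0$.

Proof: The proof is the same as in Wald (1949). For all $\theta\in\bar\Theta_K$, there exists a positive real number $\rho(\theta)$ which satisfies

$$E_0\bigl[\log\bar f(x;\theta,\rho(\theta))\bigr]<E_0[\log f(x;\theta_0)].$$

Since $\bar\Theta_K\subset\bigcup_{\theta}B(\theta,\rho(\theta))$ and $\bar\Theta_K$ is compact, there exists a finite number of balls $B(\theta^{(1)},\rho(\theta^{(1)})),\dots,B(\theta^{(S)},\rho(\theta^{(S)}))$ which cover $\bar\Theta_K$. □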

Define

$$\Theta'_{n,K,s}=\{\theta\in\Theta'_{n,K}\mid(\alpha_{K+1},a_{K+1},b_{K+1},\dots,\alpha_M,a_M,b_M)\in B(\theta^{(s)},\rho(\theta^{(s)}))\}.$$

We now cover $\Theta'_{n,K}$ by $\Theta'_{n,K,1},\dots,\Theta'_{n,K,S}$:

$$\Theta'_{n,K}=\bigcup_{s=1}^{S}\Theta'_{n,K,s}.$$

Again it suffices to prove that for each $s$, $s=1,\dots,S$,

$$\lim_{n\to\infty}\frac{\sup_{\theta\in\Theta'_{n,K,s}}\prod_{i=1}^{n}f(x_i;\theta)}{\prod_{i=1}^{n}f(x_i;\theta_0)}=0,\quad\text{a.e.}\tag{A.2}$$

We fix $s$ in addition to $K$ from now on. Because

$$\lim_{n\to\infty}\frac1n\sum_{i=1}^{n}\log f(x_i;\theta_0)=E_0[\log f(x;\theta_0)],\quad\text{a.e.},$$

(A.2) is implied by

$$\limsup_{n\to\infty}\frac1n\sup_{\theta\in\Theta'_{n,K,s}}\sum_{i=1}^{n}\log f(x_i;\theta)<E_0[\log f(x;\theta_0)],\quad\text{a.e.}\tag{A.3}$$

Therefore it suffices to prove (A.3), which is a new intermediate goal of our proof hereafter.
Choose $G$, $0<G<1$, such that

$$\lambda=E_0[\log f(x;\theta_0)]-E_0\bigl[\log\{\bar f(x;\theta^{(s)},\rho(\theta^{(s)}))+G\}\bigr]>0.\tag{A.4}$$

Let $u=\max_x f(x;\theta_0)$. Because $\{c_n\}$ decreases to zero, by replacing $c_0$ by some $c_n$ if necessary, we can again assume without loss of generality that $c_0$ is small enough to satisfy

$$2c_0<e^{-1},$$

$$3M\cdot u\cdot 2c_0\cdot(-\log G)<\frac{\lambda}{4},\tag{A.5}$$

$$2M\cdot u\cdot 2c_0\cdot\log\frac{1}{2c_0}<\frac{\lambda}{12}.\tag{A.6}$$

Although $G$ depends on $c_0$, it can be shown that $G$ and $c_0$ can be chosen small enough to satisfy these inequalities. We now prove the following lemma.

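Lemma A.2. Let $J(\theta)$ denote the support of $\sum_{m=1}^{K}\alpha_m f_m(x;\eta_m)$ and let $R_n(V)$ denote the number of observations which belong to a set $V\subset\mathbb{R}$. Then for $\theta\in\Theta'_{n,K,s}$,

$$\frac1n\sum_{i=1}^{n}\log f(x_i;\theta)\le\frac1n\sum_{i=1}^{n}\log\{\bar f(x_i;\theta^{(s)},\rho(\theta^{(s)}))+G\}+\frac1n\sum_{x_i\in J(\theta)}\log f(x_i;\theta)+\frac1n R_n(J(\theta))\cdot(-\log G).\tag{A.7}$$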

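Proof: For $x\notin J(\theta)$, $f(x;\theta)=\sum_{m=K+1}^{M}\alpha_m f_m(x;\eta_m)$. Therefore

$$\begin{aligned}\frac1n\sum_{i=1}^{n}\log f(x_i;\theta)&=\frac1n\sum_{x_i\in J(\theta)}\log f(x_i;\theta)+\frac1n\sum_{x_i\notin J(\theta)}\log\Bigl\{\sum_{m=K+1}^{M}\alpha_m f_m(x_i;\eta_m)\Bigr\}\\&\le\frac1n\sum_{i=1}^{n}\log\Bigl\{\sum_{m=K+1}^{M}\alpha_m f_m(x_i;\eta_m)+G\Bigr\}+\frac1n\sum_{x_i\in J(\theta)}\Bigl[\log f(x_i;\theta)-\log\Bigl\{\sum_{m=K+1}^{M}\alpha_m f_m(x_i;\eta_m)+G\Bigr\}\Bigr]\\&\le\frac1n\sum_{i=1}^{n}\log\{\bar f(x_i;\theta^{(s)},\rho(\theta^{(s)}))+G\}+\frac1n\sum_{x_i\in J(\theta)}\log f(x_i;\theta)-\frac1n R_n(J(\theta))\log G.\end{aligned}$$

□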
We want to bound the terms on the right-hand side of (A.7) from above. The first term is easy. In fact, by (A.4) and the strong law of large numbers we have

$$\lim_{n\to\infty}\frac1n\sum_{i=1}^{n}\log\{\bar f(x_i;\theta^{(s)},\rho(\theta^{(s)}))+G\}=E_0[\log f(x;\theta_0)]-\lambda,\quad\text{a.e.}\tag{A.8}$$

Next we consider the third term. We prove the following lemma.

Lemma A.3.

$$\limsup_{n\to\infty}\sup_{\theta\in\Theta'_{n,K,s}}\frac1n R_n(J(\theta))\le 3M\cdot u\cdot 2c_0,\quad\text{a.e.}$$

Proof: Let $\varepsilon>0$ be arbitrarily fixed and let $J_0$ be the support of the true density. $J_0$ consists of at most $M$ intervals. We divide $J_0$ from $L_{\min}$ to $L_{\max}$ by short intervals of length $2c_0$. At each right end of the intervals of $J_0$, overlap of two short intervals of length $2c_0$ is allowed, and the right end of a short interval coincides with the right end of an interval of $J_0$. See Figure 3. Let $k(c_0)$ be the number of short intervals and let $I_1(c_0),\dots,I_{k(c_0)}(c_0)$ be the divided short intervals. Because $J_0$ consists of at most $M$ intervals, we have

$$k(c_0)\le\frac{L}{2c_0}+M.$$

Figure 3: Division of $J_0$ by short intervals of length $2c_0$.

Note that any interval in $J_0$ of length $2c_0$ is covered by at most 3 short intervals from $\{I_1(c_0),\dots,I_{k(c_0)}(c_0)\}$. Now consider $J(\theta)$, the support of $\sum_{m=1}^{K}\alpha_m f_m(x;\eta_m)$. The support of each $f_m(x;\eta_m)$, $1\le m\le K$, is an interval of length less than or equal to $2c_0$. Therefore $J(\theta)$ is covered by at most $3M$ short intervals. Then the following relation holds:

$$\Bigl\{\sup_{\theta\in\Theta'_{n,K,s}}\frac1n R_n(J(\theta))-3M\cdot u\cdot 2c_0>\varepsilon\Bigr\}\subset\Bigl\{\exists k,\ 1\le k\le k(c_0):\ \frac1n R_n(I_k(c_0))-u\cdot 2c_0>\frac{\varepsilon}{3M}\Bigr\}.\tag{A.9}$$
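From (A.9), we have

$$\mathrm{Prob}\Bigl(\sup_{\theta\in\Theta'_{n,K,s}}\frac1n R_n(J(\theta))-3M\cdot u\cdot 2c_0>\varepsilon\Bigr)\le\sum_{k=1}^{k(c_0)}\mathrm{Prob}\Bigl(\frac1n R_n(I_k(c_0))-u\cdot 2c_0>\frac{\varepsilon}{3M}\Bigr).$$

For any set $V\subset\mathbb{R}$, let $P_0(V)$ denote the probability of $V$ under the true density,

$$P_0(V)=\int_V f(x;\theta_0)\,dx.$$

Then

$$P_0(I_k(c_0))\le u\cdot 2c_0,\quad k=1,\dots,k(c_0).\tag{A.10}$$

Since $R_n(V)\sim\mathrm{Bin}(n,P_0(V))$, from (2.4) we obtain

$$\mathrm{Prob}\Bigl(\frac1n R_n(I_k(c_0))-u\cdot 2c_0>\frac{\varepsilon}{3M}\Bigr)\le\mathrm{Prob}\Bigl(\frac1n R_n(I_k(c_0))-P_0(I_k(c_0))>\frac{\varepsilon}{3M}\Bigr)\le\exp\Bigl(-\frac{2n\varepsilon^2}{9M^2}\Bigr).$$

Therefore

$$\mathrm{Prob}\Bigl(\sup_{\theta\in\Theta'_{n,K,s}}\frac1n R_n(J(\theta))-3M\cdot u\cdot 2c_0>\varepsilon\Bigr)\le\Bigl(\frac{L}{2c_0}+M\Bigr)\exp\Bigl(-\frac{2n\varepsilon^2}{9M^2}\Bigr).$$

When we sum this over $n$, the resulting series on the right converges. Hence by the Borel-Cantelli lemma we have

$$\mathrm{Prob}\Bigl(\sup_{\theta\in\Theta'_{n,K,s}}\frac1n R_n(J(\theta))-3M\cdot u\cdot 2c_0>\varepsilon\ \text{i.o.}\Bigr)=0.$$

Because $\varepsilon>0$ was arbitrary, we obtain

$$\limsup_{n\to\infty}\sup_{\theta\in\Theta'_{n,K,s}}\frac1n R_n(J(\theta))\le 3M\cdot u\cdot 2c_0,\quad\text{a.e.}$$

□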
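Lemma A.3 rests on the fact that no short interval captures much more than its expected share of the observations. The sketch below computes $\max_a R_n([a,a+2c_0))/n$ exactly by sorting the sample and scanning only windows that start at data points. It assumes, for illustration only, a single-component true density $U[0,1)$, so that $M=1$, $u=1$ and the bound of the lemma is $3\cdot 2c_0$. The helper names are hypothetical.

```python
import bisect
import random

def max_window_count(xs, width):
    """Largest number of points of xs that fall in one half-open window [a, a + width).

    Only windows starting at a data point need to be tried: sliding any window to the
    right until its left end meets the first point inside it loses no points.
    """
    pts = sorted(xs)
    best = 0
    for i, left in enumerate(pts):
        j = bisect.bisect_left(pts, left + width)  # first index with pts[j] >= left + width
        best = max(best, j - i)
    return best

def scan_fraction(n, c0, seed=1):
    """max_a R_n([a, a + 2 c0)) / n for a sample of size n from U[0, 1) (so u = 1, M = 1)."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]
    return max_window_count(xs, 2 * c0) / n

for n in (1000, 10000, 100000):
    # expected share of one window is u * 2 c0 = 0.02; Lemma A.3 allows 3 M u 2 c0 = 0.06
    print(n, scan_fraction(n, c0=0.01))
```

As $n$ grows the scan fraction settles near $u\cdot 2c_0$, comfortably inside the factor-3 slack that the proof spends on covering an arbitrary window by fixed grid intervals.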



By this lemma and (A.5) we have

$$\limsup_{n\to\infty}\sup_{\theta\in\Theta'_{n,K,s}}\frac1n R_n(J(\theta))\cdot(-\log G)\le 3M\cdot u\cdot 2c_0\cdot(-\log G)<\frac{\lambda}{4},\quad\text{a.e.}\tag{A.11}$$

This bounds the third term on the right-hand side of (A.7) from above.

Finally we bound the second term on the right-hand side of (A.7) from above. This is the most difficult part of our proof. For $x\in J(\theta)$ write $f(x;\theta)=\sum_{m=1}^{M}\alpha_m f_m(x;\eta_m)$ as

$$f(x;\theta)=\sum_{t=1}^{T(\theta)}H(J_t(\theta))\,\mathbf{1}_{J_t(\theta)}(x),\tag{A.12}$$

where $J_t=J_t(\theta)$ are disjoint half-open intervals, $\mathbf{1}_{J_t(\theta)}(x)$ is the indicator function,

$$H(J_t(\theta))=f(x;\theta),\quad x\in J_t(\theta),$$

is the height of $f(x;\theta)$ on $J_t(\theta)$, and $T=T(\theta)$ is the number of the intervals $J_t(\theta)$. Note that $T(\theta)\le 2M$, because $f(x;\theta)$ changes its height only at $a_m-b_m$ or $a_m+b_m$, $m=1,\dots,M$. For convenience we order the index $t$ so that

$$H(J_1(\theta))\le H(J_2(\theta))\le\dots\le H(J_{T(\theta)}(\theta)).$$

We now classify the intervals $J_t(\theta)$, $t=1,\dots,T(\theta)$, by the height $H(J_t(\theta))$. Define $c'_n$ by

$$c'_n=c\cdot\exp(-n^{1/4})$$

and define $\tau_n(\theta)$ by

$$\tau_n(\theta)=\max\Bigl\{t\in\{1,\dots,T\}\ \Bigm|\ H(J_t(\theta))<\frac{1}{2c'_n}\Bigr\}.\tag{A.13}$$
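The step-function representation (A.12) and the index $\tau_n(\theta)$ in (A.13) can be computed directly from the break points $a_m\pm b_m$. The sketch below (hypothetical helper names; exact arithmetic with fractions) returns the cells between consecutive break points on which a given mixture is positive, sorted by height, and counts the cells whose height is below the threshold $1/(2c'_n)$. It works on the whole support of the listed components; restricting to $J(\theta)$ only drops cells. Adjacent cells of equal height are not merged, which does not affect the bound $T(\theta)\le 2M$.

```python
from fractions import Fraction as F

def step_representation(components):
    """Step-function form (A.12) of sum_m alpha_m f_m(x; a_m, b_m).

    components: list of (alpha, a, b); f_m is the uniform density on [a - b, a + b).
    Returns (left, right, height) for every cell [left, right) between consecutive
    break points on which the mixture is positive, sorted by increasing height.
    """
    cuts = sorted({a - b for _, a, b in components} | {a + b for _, a, b in components})
    cells = []
    for left, right in zip(cuts, cuts[1:]):
        # No break point lies strictly inside [left, right), so the height there is the
        # total contribution of the components whose support contains `left`.
        height = sum(alpha / (2 * b) for alpha, a, b in components if a - b <= left < a + b)
        if height > 0:
            cells.append((left, right, height))
    cells.sort(key=lambda cell: cell[2])
    return cells

def tau(cells, c_prime):
    """Number of cells with height < 1 / (2 c'_n); as the cells are sorted by height,
    this is the index tau_n(theta) of (A.13) (0 if no cell qualifies)."""
    return sum(1 for _, _, height in cells if height < 1 / (2 * c_prime))

comps = [(F(1, 2), F(1, 2), F(1, 2)), (F(1, 2), F(1, 2), F(1, 4))]
for cell in step_representation(comps):
    print(cell)
```

Here the two components on $[0,1)$ and $[1/4,3/4)$ give three cells with heights $1/2$, $1/2$ and $3/2$, whose areas sum to 1.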

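Then the second term on the right-hand side of (A.7) is written as

$$\begin{aligned}\frac1n\sum_{x_i\in J(\theta)}\log f(x_i;\theta)&=\sum_{t=1}^{T(\theta)}\frac1n\sum_{x_i\in J_t(\theta)}\log H(J_t(\theta))=\sum_{t=1}^{T(\theta)}\frac1n R_n(J_t(\theta))\log H(J_t(\theta))\\&=\sum_{t=1}^{\tau_n(\theta)}\frac1n R_n(J_t(\theta))\log H(J_t(\theta))+\sum_{t=\tau_n(\theta)+1}^{T(\theta)}\frac1n R_n(J_t(\theta))\log H(J_t(\theta)).\end{aligned}\tag{A.14}$$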

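From (A.5), (A.6), and noting that $\log x/x$ is decreasing in $x>e$, we have

$$\begin{aligned}3\sum_{t=1}^{\tau_n(\theta)}\frac{u}{H(J_t(\theta))}\log H(J_t(\theta))&\le 3\cdot 2M\cdot u\cdot 2c_0\log\frac{1}{2c_0}<\frac{\lambda}{4},\\3\sum_{t=\tau_n(\theta)+1}^{T(\theta)}\frac1n\log H(J_t(\theta))&\le 3\cdot 2M\cdot\frac1n\Bigl(n^d-\log\frac{2c}{M}\Bigr)\to 0,\end{aligned}\tag{A.15}$$

where the second line uses $T(\theta)\le 2M$ and $H(J_t(\theta))\le M/(2c_n)$.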

t=T„(0) + l 

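Suppose that the following inequality holds:

$$\limsup_{n\to\infty}\sup_{\theta\in\Theta'_{n,K,s}}\Biggl[\sum_{t=1}^{T(\theta)}\frac1n R_n(J_t(\theta))\log H(J_t(\theta))-3\Biggl\{\sum_{t=1}^{\tau_n(\theta)}\frac{u}{H(J_t(\theta))}\log H(J_t(\theta))+\sum_{t=\tau_n(\theta)+1}^{T(\theta)}\frac1n\log H(J_t(\theta))\Biggr\}\Biggr]\le 0,\quad\text{a.e.}\tag{A.16}$$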



Then from (A.14) and (A.15), the second term on the right-hand side of (A.7) is bounded from above as

$$\limsup_{n\to\infty}\frac1n\sup_{\theta\in\Theta'_{n,K,s}}\sum_{x_i\in J(\theta)}\log f(x_i;\theta)\le\frac{\lambda}{4},\quad\text{a.e.}\tag{A.17}$$

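Combining (A.8), (A.11) and (A.17), we obtain

$$\limsup_{n\to\infty}\sup_{\theta\in\Theta'_{n,K,s}}\frac1n\sum_{i=1}^{n}\log f(x_i;\theta)\le\bigl(E_0[\log f(x;\theta_0)]-\lambda\bigr)+\frac{\lambda}{4}+\frac{\lambda}{4}=E_0[\log f(x;\theta_0)]-\frac{\lambda}{2},\quad\text{a.e.},$$

and (A.3) is satisfied. Therefore it suffices to prove (A.16), which is a new goal of our proof.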

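We now consider a further finite covering of $\Theta'_{n,K,s}$. Define

$$\Theta'_{n,K,s,T,\tau}=\{\theta\in\Theta'_{n,K,s}\mid T(\theta)=T,\ \tau_n(\theta)=\tau\}.$$

Then

$$\begin{aligned}&\sup_{\theta\in\Theta'_{n,K,s}}\Biggl[\sum_{t=1}^{T(\theta)}\frac1n R_n(J_t(\theta))\log H(J_t(\theta))-3\Biggl\{\sum_{t=1}^{\tau_n(\theta)}\frac{u}{H(J_t(\theta))}\log H(J_t(\theta))+\sum_{t=\tau_n(\theta)+1}^{T(\theta)}\frac1n\log H(J_t(\theta))\Biggr\}\Biggr]\\&\le\max_{T=1,\dots,2M}\ \max_{\tau=1,\dots,T}\Biggl\{\sup_{\theta\in\Theta'_{n,K,s,T,\tau}}\Biggl[\sum_{t=1}^{\tau}\frac1n R_n(J_t(\theta))\log H(J_t(\theta))-3\sum_{t=1}^{\tau}\frac{u}{H(J_t(\theta))}\log H(J_t(\theta))\Biggr]\\&\qquad+\sup_{\theta\in\Theta'_{n,K,s,T,\tau}}\Biggl[\sum_{t=\tau+1}^{T}\frac1n R_n(J_t(\theta))\log H(J_t(\theta))-3\sum_{t=\tau+1}^{T}\frac1n\log H(J_t(\theta))\Biggr]\Biggr\}.\end{aligned}\tag{A.18}$$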



Suppose that the following inequalities hold for all $T$ and $\tau$:

$$\limsup_{n\to\infty}\sup_{\theta\in\Theta'_{n,K,s,T,\tau}}\Biggl[\sum_{t=1}^{\tau}\frac1n R_n(J_t(\theta))\log H(J_t(\theta))-3\sum_{t=1}^{\tau}\frac{u}{H(J_t(\theta))}\log H(J_t(\theta))\Biggr]\le 0,\quad\text{a.e.},\tag{A.19}$$

$$\limsup_{n\to\infty}\sup_{\theta\in\Theta'_{n,K,s,T,\tau}}\Biggl[\sum_{t=\tau+1}^{T}\frac1n R_n(J_t(\theta))\log H(J_t(\theta))-3\sum_{t=\tau+1}^{T}\frac1n\log H(J_t(\theta))\Biggr]\le 0,\quad\text{a.e.}\tag{A.20}$$

Then (A.16) is derived from (A.18), (A.19) and (A.20). Therefore it suffices to prove (A.19) and (A.20), which are the final goals of our proof. We state (A.19) and (A.20) as two lemmas and give their proofs.



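Lemma A.4.

$$\limsup_{n\to\infty}\sup_{\theta\in\Theta'_{n,K,s,T,\tau}}\Biggl[\sum_{t=\tau+1}^{T}\frac1n R_n(J_t(\theta))\log H(J_t(\theta))-3\sum_{t=\tau+1}^{T}\frac1n\log H(J_t(\theta))\Biggr]\le 0,\quad\text{a.e.}$$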



Proof: Let $\delta>0$ be any fixed positive real constant and let $a'_t(\theta)$ denote the midpoint of $J_t(\theta)$. Here we consider the probability of the event that

$$\sup_{\theta\in\Theta'_{n,K,s,T,\tau}}\Biggl[\sum_{t=\tau+1}^{T}\frac1n R_n(J_t(\theta))\log H(J_t(\theta))-3\sum_{t=\tau+1}^{T}\frac1n\log H(J_t(\theta))\Biggr]>2M\delta.\tag{A.21}$$

Noting that for $t>\tau$ the length of $J_t(\theta)$ is less than or equal to $2c'_n$, the following relation holds for this event:

$$\begin{aligned}&\text{The event (A.21) occurs}\\&\Rightarrow\ \sup_{\theta\in\Theta'_{n,K,s,T,\tau}}\sum_{t=\tau+1}^{T}\max\Bigl\{0,\ \frac1n R_n([a'_t(\theta)-c'_n,a'_t(\theta)+c'_n])-\frac3n\Bigr\}\log\frac{M}{2c_n}>2M\delta\\&\Rightarrow\ \exists\theta\in\Theta'_{n,K,s,T,\tau},\ \exists t>\tau:\ \max\Bigl\{0,\ \frac1n R_n([a'_t(\theta)-c'_n,a'_t(\theta)+c'_n])-\frac3n\Bigr\}\log\frac{M}{2c_n}>\delta\\&\Rightarrow\ \exists\theta\in\Theta'_{n,K,s,T,\tau},\ \exists t>\tau:\ R_n([a'_t(\theta)-c'_n,a'_t(\theta)+c'_n])>3\\&\Rightarrow\ \sup_{L_{\min}\le a'\le L_{\max}}R_n([a'-c'_n,a'+c'_n])>3.\end{aligned}\tag{A.22}$$

Below, we consider the probability of the event that (A.22) occurs. We divide $J_0$ from $L_{\min}$ to $L_{\max}$ by short intervals of length $2c'_n$ as in the proof of Lemma A.3. Let $k(c'_n)$ be the number of short intervals and let $I_1(c'_n),\dots,I_{k(c'_n)}(c'_n)$ be the divided short intervals. Because $J_0$ consists of at most $M$ intervals, we have

$$k(c'_n)\le\frac{L}{2c'_n}+M.\tag{A.23}$$

Since any interval in $J_0$ of length $2c'_n$ is covered by at most 3 short intervals from $\{I_1(c'_n),\dots,I_{k(c'_n)}(c'_n)\}$, the following relation holds:

$$\sup_{L_{\min}\le a'\le L_{\max}}R_n([a'-c'_n,a'+c'_n])>3\ \Rightarrow\ \exists k,\ 1\le k\le k(c'_n):\ R_n(I_k(c'_n))\ge 2.\tag{A.24}$$

Note that $R_n(I_k(c'_n))\sim\mathrm{Bin}(n,P_0(I_k(c'_n)))$ and $P_0(I_k(c'_n))\le 2c'_n u$. Therefore, from (A.22), (A.23) and (A.24) we have

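$$\begin{aligned}&\mathrm{Prob}\Biggl(\sup_{\theta\in\Theta'_{n,K,s,T,\tau}}\Biggl[\sum_{t=\tau+1}^{T}\frac1n R_n(J_t(\theta))\log H(J_t(\theta))-3\sum_{t=\tau+1}^{T}\frac1n\log H(J_t(\theta))\Biggr]>2M\delta\Biggr)\\&\le\Bigl(\frac{L}{2c'_n}+M\Bigr)\sum_{k=2}^{n}\binom{n}{k}(2c'_n u)^k(1-2c'_n u)^{n-k}\\&\le\Bigl(\frac{L}{2c'_n}+M\Bigr)(2nc'_n u)^2\exp(2nc'_n u).\end{aligned}$$

When we sum this over $n$, the resulting series on the right converges. Hence by the Borel-Cantelli lemma and the fact that $\delta>0$ was arbitrary, we obtain

$$\limsup_{n\to\infty}\sup_{\theta\in\Theta'_{n,K,s,T,\tau}}\Biggl[\sum_{t=\tau+1}^{T}\frac1n R_n(J_t(\theta))\log H(J_t(\theta))-3\sum_{t=\tau+1}^{T}\frac1n\log H(J_t(\theta))\Biggr]\le 0,\quad\text{a.e.}$$

□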
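The two estimates used at the end of the proof of Lemma A.4 are the elementary bound $\sum_{k\ge 2}\binom{n}{k}p^k(1-p)^{n-k}\le(np)^2e^{np}$ and the decay of $\bigl(\frac{L}{2c'_n}+M\bigr)(2nc'_nu)^2\exp(2nc'_nu)$. The sketch below checks the first numerically and evaluates the second for a few $n$. The constants $c=L=u=1$, $M=2$ are illustrative assumptions, not values from the paper, and the helper names are hypothetical.

```python
import math

def tail_at_least_two(n, p):
    """Exact Prob(R >= 2) for R ~ Bin(n, p)."""
    return 1.0 - (1 - p) ** n - n * p * (1 - p) ** (n - 1)

def tail_bound(n, p):
    """(n p)^2 exp(n p): since C(n, k) p^k <= (n p)^k / k! <= (n p)^k / (k - 2)!, the sum over
    k >= 2 of C(n, k) p^k (1 - p)^(n - k) is at most this."""
    return (n * p) ** 2 * math.exp(n * p)

def union_term(n, c=1.0, L=1.0, M=2, u=1.0):
    """(L / (2 c'_n) + M) (2 n c'_n u)^2 exp(2 n c'_n u) with c'_n = c exp(-n^(1/4)).
    The constants are illustrative only. For n beyond about 1e11, c'_n underflows to 0.0
    and the division fails, so keep n moderate."""
    c_prime = c * math.exp(-n ** 0.25)
    p = 2 * c_prime * u
    return (L / (2 * c_prime) + M) * tail_bound(n, p)

for n in (10**4, 10**6, 10**8):
    print(n, union_term(n))
```

The term behaves like $n^2\exp(-n^{1/4})$, which is large for moderate $n$ but summable, and summability is all that the Borel-Cantelli argument needs.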



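Finally we prove (A.19).

Lemma A.5.

$$\limsup_{n\to\infty}\sup_{\theta\in\Theta'_{n,K,s,T,\tau}}\Biggl[\sum_{t=1}^{\tau}\frac1n R_n(J_t(\theta))\log H(J_t(\theta))-3\sum_{t=1}^{\tau}\frac{u}{H(J_t(\theta))}\log H(J_t(\theta))\Biggr]\le 0,\quad\text{a.e.}$$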
Proof: Let $\delta>0$ be any fixed positive real constant and let $h_n$ be
(A.25) *. 3 ^{ ol0 gg)J \ 

We divide $[c'_n/M,c_0]$ from $c_0$ to $c'_n/M$ by short intervals of length $h_n$. At the left end $c'_n/M$ of the interval $[c'_n/M,c_0]$, overlap of two short intervals of length $h_n$ is allowed, and the left end of a short interval is equal to $c'_n/M$. Let $l_n$ be the number of short intervals of length $h_n$ and define $b^{(n)}_l$ by

$$b^{(n)}_l=\begin{cases}c_0-(l-1)h_n, & 1\le l\le l_n,\\ c'_n/M, & l=l_n+1.\end{cases}$$

Then we have

$$l_n\le\frac{c_0}{h_n}+1.\tag{A.26}$$

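Next, we consider the probability of the event that

$$\sup_{\theta\in\Theta'_{n,K,s,T,\tau}}\Biggl[\sum_{t=1}^{\tau}\frac1n R_n(J_t(\theta))\log H(J_t(\theta))-3\sum_{t=1}^{\tau}\frac{u}{H(J_t(\theta))}\log H(J_t(\theta))\Biggr]>2M\delta.\tag{A.27}$$

For this event the following relation holds:

$$\begin{aligned}&\text{The event (A.27) occurs}\\&\Rightarrow\ \exists\theta\in\Theta'_{n,K,s,T,\tau},\ \exists l(1),\dots,l(\tau)\in\{1,\dots,l_n\}\ \text{s.t.}\\&\qquad\sum_{t=1}^{\tau}\max\Bigl\{0,\ \Bigl(\frac1n R_n([a'_t(\theta)-b^{(n)}_{l(t)},a'_t(\theta)+b^{(n)}_{l(t)}])-3u\cdot 2b^{(n)}_{l(t)+1}\Bigr)\log\frac{1}{2b^{(n)}_{l(t)+1}}\Bigr\}>2M\delta\\&\Rightarrow\ \exists\theta\in\Theta'_{n,K,s,T,\tau},\ \exists t\in\{1,\dots,\tau\},\ \exists l(t)\in\{1,\dots,l_n\}\ \text{s.t.}\\&\qquad\max\Bigl\{0,\ \Bigl(\frac1n R_n([a'_t(\theta)-b^{(n)}_{l(t)},a'_t(\theta)+b^{(n)}_{l(t)}])-3u\cdot 2b^{(n)}_{l(t)+1}\Bigr)\log\frac{1}{2b^{(n)}_{l(t)+1}}\Bigr\}>\delta\\&\Rightarrow\ \exists l\in\{1,\dots,l_n\}\ \text{s.t.}\ \sup_{L_{\min}\le a'\le L_{\max}}\Bigl(\frac1n R_n([a'-b^{(n)}_l,a'+b^{(n)}_l])-3u\cdot 2b^{(n)}_{l+1}\Bigr)\log\frac{1}{2b^{(n)}_{l+1}}>\delta\\&\Rightarrow\ \exists l\in\{1,\dots,l_n\}\ \text{s.t.}\ \sup_{L_{\min}\le a'\le L_{\max}}\Bigl\{\Bigl(\frac1n R_n([a'-b^{(n)}_l,a'+b^{(n)}_l])-3u\cdot 2b^{(n)}_l\Bigr)\log\frac{1}{2b^{(n)}_{l+1}}+3u\bigl(2b^{(n)}_l-2b^{(n)}_{l+1}\bigr)\log\frac{1}{2b^{(n)}_{l+1}}\Bigr\}>\delta.\end{aligned}\tag{A.28}$$

Then from (A.25) the following relation holds:

$$\text{The event (A.28) occurs}\ \Rightarrow\ \exists l\in\{1,\dots,l_n\}\ \text{s.t.}\ \sup_{L_{\min}\le a'\le L_{\max}}\Bigl(\frac1n R_n([a'-b^{(n)}_l,a'+b^{(n)}_l])-3u\cdot 2b^{(n)}_l\Bigr)\log\frac{1}{2b^{(n)}_{l+1}}>\frac{\delta}{2}.\tag{A.29}$$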

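Below, we consider the probability of the event that (A.29) occurs. We divide $J_0$ from $L_{\min}$ to $L_{\max}$ by short intervals of length $2b^{(n)}_l$ as in the proof of Lemma A.3. Let $k(b^{(n)}_l)$ be the number of short intervals and let $I_1(b^{(n)}_l),\dots,I_{k(b^{(n)}_l)}(b^{(n)}_l)$ be the divided short intervals. Then we have

$$k(b^{(n)}_l)\le\frac{L}{2b^{(n)}_l}+M.\tag{A.30}$$

Since any interval in $J_0$ of length $2b^{(n)}_l$ is covered by at most 3 short intervals from $\{I_1(b^{(n)}_l),\dots,I_{k(b^{(n)}_l)}(b^{(n)}_l)\}$, the following relation holds:

$$\sup_{L_{\min}\le a'\le L_{\max}}\Bigl(\frac1n R_n([a'-b^{(n)}_l,a'+b^{(n)}_l])-3u\cdot 2b^{(n)}_l\Bigr)>\frac{\delta}{2}\Bigl(\log\frac{1}{2b^{(n)}_{l+1}}\Bigr)^{-1}\ \Rightarrow\ \max_{k=1,\dots,k(b^{(n)}_l)}\Bigl(\frac1n R_n(I_k(b^{(n)}_l))-u\cdot 2b^{(n)}_l\Bigr)>\frac{\delta}{6}\Bigl(\log\frac{1}{2b^{(n)}_{l+1}}\Bigr)^{-1}.\tag{A.31}$$

Note that $R_n(I_k(b^{(n)}_l))\sim\mathrm{Bin}(n,P_0(I_k(b^{(n)}_l)))$ and $P_0(I_k(b^{(n)}_l))\le u\cdot 2b^{(n)}_l$. Therefore, from (2.4) and (A.30) we have

$$\begin{aligned}\mathrm{Prob}\Biggl(\max_{k=1,\dots,k(b^{(n)}_l)}\Bigl(\frac1n R_n(I_k(b^{(n)}_l))-u\cdot 2b^{(n)}_l\Bigr)>\frac{\delta}{6}\Bigl(\log\frac{1}{2b^{(n)}_{l+1}}\Bigr)^{-1}\Biggr)&\le\Bigl(\frac{L}{2b^{(n)}_l}+M\Bigr)\exp\Bigl\{-2n\cdot\frac{\delta^2}{36}\Bigl(\log\frac{1}{2b^{(n)}_{l+1}}\Bigr)^{-2}\Bigr\}\\&\le\Bigl(\frac{ML}{2c'_n}+M\Bigr)\exp\Bigl\{-2n\cdot\frac{\delta^2}{36}\Bigl(\log\frac{M}{2c'_n}\Bigr)^{-2}\Bigr\}.\end{aligned}\tag{A.32}$$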

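From (A.26), (A.28), (A.29), (A.31) and (A.32), we obtain

$$\mathrm{Prob}\Biggl(\sup_{\theta\in\Theta'_{n,K,s,T,\tau}}\Biggl[\sum_{t=1}^{\tau}\frac1n R_n(J_t(\theta))\log H(J_t(\theta))-3\sum_{t=1}^{\tau}\frac{u}{H(J_t(\theta))}\log H(J_t(\theta))\Biggr]>2M\delta\Biggr)\le\Bigl(\frac{c_0}{h_n}+1\Bigr)\Bigl(\frac{ML}{2c'_n}+M\Bigr)\exp\Bigl\{-2n\cdot\frac{\delta^2}{36}\Bigl(\log\frac{M}{2c'_n}\Bigr)^{-2}\Bigr\}.$$

When we sum this over $n$, the resulting series on the right converges. Hence by the Borel-Cantelli lemma and the fact that $\delta>0$ is arbitrary, we have

$$\limsup_{n\to\infty}\sup_{\theta\in\Theta'_{n,K,s,T,\tau}}\Biggl[\sum_{t=1}^{\tau}\frac1n R_n(J_t(\theta))\log H(J_t(\theta))-3\sum_{t=1}^{\tau}\frac{u}{H(J_t(\theta))}\log H(J_t(\theta))\Biggr]\le 0,\quad\text{a.e.}$$

□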



This completes the proof of Theorem 3.1.